frequency resolution


RFUAV: A Benchmark Dataset for Unmanned Aerial Vehicle Detection and Identification

arXiv.org Artificial Intelligence

In this paper, we propose RFUAV, a new benchmark dataset for radio-frequency-based (RF-based) unmanned aerial vehicle (UAV) identification, which addresses the following challenges. First, many existing datasets cover a limited variety of drone types and provide insufficient raw data to meet the demands of practical applications. Second, existing datasets often lack raw data spanning a broad range of signal-to-noise ratios (SNRs), or do not provide tools for transforming raw data to different SNR levels, which undermines the validity of model training and evaluation. Third, many existing datasets offer no open-access evaluation tools, so research in this field lacks unified evaluation standards. RFUAV comprises approximately 1.3 TB of raw radio-frequency data collected from 37 distinct UAVs using a Universal Software Radio Peripheral (USRP) device in real-world environments. Through in-depth analysis of the RF data in RFUAV, we define a drone feature sequence called the RF drone fingerprint, which helps distinguish drone signals. In addition to the dataset, RFUAV provides a baseline preprocessing method and model evaluation tools. Rigorous experiments demonstrate that these preprocessing methods achieve state-of-the-art (SOTA) performance under the provided evaluation tools. The RFUAV dataset and baseline implementation are publicly available at https://github.com/kitoweeknd/RFUAV/.
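The abstract mentions tools for transforming raw data to different SNR levels but does not describe them. As a minimal sketch, assuming complex baseband IQ samples and additive white Gaussian noise (AWGN), a capture can be degraded to a target SNR by scaling the noise to the measured signal power. The helper `add_awgn` is hypothetical and is not part of the RFUAV repository.

```python
from typing import Optional

import numpy as np


def add_awgn(iq: np.ndarray, snr_db: float,
             rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Hypothetical helper: add complex white Gaussian noise at snr_db.

    The input is treated as noise-free, so noise already present in a real
    capture is ignored and the true output SNR will be somewhat lower.
    """
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(np.abs(iq) ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    # Split the noise power equally between the I and Q components.
    noise = np.sqrt(noise_power / 2.0) * (
        rng.standard_normal(iq.shape) + 1j * rng.standard_normal(iq.shape)
    )
    return iq + noise


# Usage: degrade a synthetic complex tone to 10 dB SNR and measure the result.
rng = np.random.default_rng(0)
iq = np.exp(2j * np.pi * 0.01 * np.arange(100_000))
noisy = add_awgn(iq, 10.0, rng)
measured = 10 * np.log10(np.mean(np.abs(iq) ** 2) / np.mean(np.abs(noisy - iq) ** 2))
print(f"measured SNR: {measured:.2f} dB")
```

Because the capture is treated as clean, the requested SNR is an upper bound on the SNR actually achieved for real recordings.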


DiffNMR3: Advancing NMR Resolution Beyond Instrumental Limits

arXiv.org Artificial Intelligence

Nuclear Magnetic Resonance (NMR) spectroscopy is a crucial analytical technique used for molecular structure elucidation, with applications spanning chemistry, biology, materials science, and medicine. However, the frequency resolution of NMR spectra is limited by the "field strength" of the instrument. High-field NMR instruments provide high-resolution spectra but are prohibitively expensive, whereas lower-field instruments offer more accessible, but lower-resolution, results. This paper introduces an AI-driven approach that not only enhances the frequency resolution of NMR spectra through super-resolution techniques but also provides multi-scale functionality. By leveraging a diffusion model, our method can reconstruct high-field spectra from low-field NMR data, offering flexibility in generating spectra at varying magnetic field strengths. These reconstructions are comparable to those obtained from high-field instruments, enabling finer spectral details and improving molecular characterization. To date, our approach is one of the first to overcome the limitations of instrument field strength, achieving NMR super-resolution through AI. This cost-effective solution makes high-resolution analysis accessible to more researchers and industries, without the need for multimillion-dollar equipment.
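Why field strength limits resolution can be illustrated with simple arithmetic: chemical shifts are expressed in ppm, so the separation between two peaks in Hz scales with the spectrometer frequency, whereas linewidths in Hz do not, to a first approximation, grow with the field. The sketch below uses a crude resolvability criterion (separation greater than an assumed linewidth) and illustrative field strengths; the function names and numbers are hypothetical and do not come from the paper.

```python
def ppm_to_hz(delta_ppm: float, spectrometer_mhz: float) -> float:
    """Convert a chemical-shift difference in ppm to Hz.

    On a spectrometer operating at F MHz, 1 ppm corresponds to F Hz.
    """
    return delta_ppm * spectrometer_mhz


def resolved(delta_ppm: float, spectrometer_mhz: float, linewidth_hz: float) -> bool:
    """Crude, assumed criterion: two peaks resolve if separation exceeds linewidth."""
    return ppm_to_hz(delta_ppm, spectrometer_mhz) > linewidth_hz


# Illustrative values only: peaks 0.02 ppm apart, assumed 2 Hz linewidth.
for field_mhz in (60, 400):
    sep = ppm_to_hz(0.02, field_mhz)
    print(f"{field_mhz} MHz: {sep:.1f} Hz apart, resolved={resolved(0.02, field_mhz, 2.0)}")
```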


Blind Separation of Vibration Sources using Deep Learning and Deconvolution

arXiv.org Artificial Intelligence

Vibrations of rotating machinery primarily originate from two sources, both of which are distorted by the machine's transfer function on their way to the sensor: the dominant gear-related vibrations and a low-energy signal linked to bearing faults. The proposed method enables blind separation of these vibration sources, requiring no information about the monitored equipment and no external measurements. It estimates both sources in two stages: first, the gear signal is isolated using a dilated CNN; then, the bearing fault signal is estimated from the squared log envelope of the residual. The effect of the transfer function is removed from both sources using a novel whitening-based deconvolution method (WBD). Both simulation and experimental results demonstrate the method's ability to detect bearing failures early when no additional information is available. This study considers both local and distributed bearing faults and assumes that the vibrations are recorded under stable operating conditions.
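The bearing-fault stage relies on envelope analysis of the residual. The paper's exact squared log envelope is not defined here, so the sketch below shows only the standard squared envelope spectrum, computed from an FFT-based analytic signal (the same construction that `scipy.signal.hilbert` uses); it does not reproduce the dilated CNN or the WBD deconvolution. The function names and the synthetic signal are hypothetical.

```python
import numpy as np


def analytic_signal(x: np.ndarray) -> np.ndarray:
    """FFT-based analytic signal (same construction as scipy.signal.hilbert)."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    # Zero the negative frequencies and double the positive ones.
    return np.fft.ifft(np.fft.fft(x) * h)


def squared_envelope_spectrum(residual: np.ndarray, fs: float):
    """Hypothetical helper: (frequencies, magnitudes) of the squared envelope."""
    env2 = np.abs(analytic_signal(residual)) ** 2
    env2 = env2 - env2.mean()  # remove DC so modulation lines stand out
    mag = np.abs(np.fft.rfft(env2)) / len(env2)
    freqs = np.fft.rfftfreq(len(env2), d=1.0 / fs)
    return freqs, mag


# Synthetic stand-in for a residual: a 3 kHz resonance modulated at 37 Hz.
fs = 20_000
t = np.arange(fs) / fs
residual = (1 + 0.8 * np.cos(2 * np.pi * 37 * t)) * np.cos(2 * np.pi * 3000 * t)
freqs, mag = squared_envelope_spectrum(residual, fs)
print("dominant envelope frequency:", freqs[np.argmax(mag)], "Hz")
```

In this toy example the modulation rate plays the role of a bearing fault frequency; real residuals would also contain noise and leftover gear components.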


Differentiable short-time Fourier transform with respect to the hop length

arXiv.org Artificial Intelligence

The short-time Fourier transform (STFT) is a frequently used tool for analyzing non-stationary digital signals in various fields, including audio Stafford et al. [1998], medicine Huang et al. [2019], and vibration analysis Leclère et al. [2016]. Spectrograms, obtained from the STFT magnitude, are essential for visualizing, understanding, and processing non-stationary signals in a time-frequency representation. The STFT parameters, including the tapering function, window length, and hop length, are critical and depend on the application and signal characteristics. The tapering function trades frequency resolution against spectral leakage: a narrower main lobe provides better frequency resolution at the expense of increased spectral leakage, whereas a wider main lobe reduces spectral leakage but decreases frequency resolution. The Hann or Hamming window is a common starting point, but the best choice depends on the application's specific requirements. In fact, most studies on STFT parameters have focused on the choice of window length, as it determines the time-frequency resolution trade-off. A shorter window provides better time resolution but poorer frequency resolution; conversely, a longer window provides better frequency resolution but poorer time resolution. To provide more precise control over temporal and frequency resolution based on the local characteristics of the input signal, researchers have proposed using variable-length windows.
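The window-length trade-off can be made concrete: for a window of N samples at sampling rate fs, the DFT bin spacing is fs/N and each frame spans N/fs seconds, while the hop length H only sets how often frames are taken (every H/fs seconds). The minimal NumPy STFT below (Hann taper, no padding) is an illustrative sketch, not the differentiable STFT proposed in the paper; the helper names are hypothetical.

```python
import numpy as np


def stft(x: np.ndarray, win_length: int, hop_length: int) -> np.ndarray:
    """Minimal STFT with a Hann taper and no padding; shape (frames, bins)."""
    window = np.hanning(win_length)
    n_frames = 1 + (len(x) - win_length) // hop_length
    frames = np.stack([x[i * hop_length:i * hop_length + win_length] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def resolution(fs: float, win_length: int, hop_length: int):
    """Bin spacing (Hz), frame duration (s) and frame step (s)."""
    return fs / win_length, win_length / fs, hop_length / fs


fs = 8000
x = np.random.default_rng(0).standard_normal(fs)  # 1 s of noise
for n in (64, 1024):
    S = stft(x, win_length=n, hop_length=256)
    print(n, S.shape, resolution(fs, n, 256))
```

At fs = 8 kHz, a 64-sample window gives 125 Hz bins over 8 ms, while a 1024-sample window gives 7.8125 Hz bins over 128 ms. The Hann main lobe is about four bins wide, so the resolvable frequency separation is coarser than the bin spacing alone suggests.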


Differentiable adaptive short-time Fourier transform with respect to the window length

arXiv.org Artificial Intelligence

Fourier theory is a crucial aspect of signal processing, widely used in science and engineering. The short-time Fourier transform (STFT), also known as the windowed Fourier transform, plays a vital role in analyzing non-stationary signals with time-varying spectral content. Spectrograms, derived from the STFT magnitude, are commonly used for visualizing and processing non-stationary signals. The STFT window length is a critical parameter that determines the trade-off between temporal and frequency resolution, and several post-processing techniques have been developed to improve spectrogram readability, including synchrosqueezing Thakur et al. [2013] and reassignment Auger and Flandrin [1995]. Some researchers have proposed finding the optimal window length based on a given criterion Meignen et al. [2020], Jablonski and Dziedziech [2022], while others have recently proposed a version of the STFT that is differentiable with respect to the window length Leiber et al. [2022a,b], Zhao et al. [2021], allowing the criterion to be optimized by gradient descent instead of grid search. In fact, the best window length depends on the signal itself, and in particular on its frequency content, so it must adapt to the time-varying spectral structure of the signal. Enhanced versions of the STFT have therefore been proposed that set the window length according to the local characteristics of the input signal.
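Criterion-based window-length selection, the grid-search baseline that a differentiable STFT replaces, can be sketched as scoring each candidate window with a concentration measure and keeping the best one. Using the Shannon entropy of the normalized spectrogram as the criterion is an assumption made here for illustration; the cited works define their own criteria, and the function names are hypothetical.

```python
import numpy as np


def spectrogram(x, win_length, hop_length):
    """Power spectrogram with a Hann taper and no padding; shape (frames, bins)."""
    window = np.hanning(win_length)
    n_frames = 1 + (len(x) - win_length) // hop_length
    frames = np.stack([x[i * hop_length:i * hop_length + win_length] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def spectral_entropy(S):
    """Shannon entropy of the spectrogram normalized to unit sum (assumed criterion)."""
    p = S / S.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))


def best_window_length(x, candidates, hop_length):
    """Hypothetical grid search: the candidate with the lowest entropy wins."""
    scores = [spectral_entropy(spectrogram(x, n, hop_length)) for n in candidates]
    return candidates[int(np.argmin(scores))], scores


candidates = [64, 256, 1024]
impulse = np.zeros(8192)
impulse[4096] = 1.0
tone = np.sin(2 * np.pi * 0.1 * np.arange(8192))
print("impulse:", best_window_length(impulse, candidates, hop_length=32)[0])
print("tone:", best_window_length(tone, candidates, hop_length=32)[0])
```

Entropies computed on spectrograms with different numbers of frames and bins are not strictly comparable, so a practical criterion would need some normalization; this sketch also picks one window for the whole signal rather than adapting it locally.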


EEG-NeXt: A Modernized ConvNet for The Classification of Cognitive Activity from EEG

arXiv.org Artificial Intelligence

One of the main challenges in electroencephalogram (EEG)-based brain-computer interface (BCI) systems is learning subject- and session-invariant features to classify cognitive activities within an end-to-end discriminative setting. We propose a novel end-to-end machine learning pipeline, EEG-NeXt, which facilitates transfer learning by: i) aligning the EEG trials from different subjects in Euclidean space, ii) tailoring deep learning techniques to scalograms of EEG signals to better capture frequency localization for low-frequency, longer-duration events, and iii) using a pretrained ConvNeXt (a modernized ResNet architecture that supersedes state-of-the-art (SOTA) image classification models) as the backbone network via adaptive fine-tuning. On publicly available datasets (Physionet Sleep Cassette and BNCI2014001), we benchmark our method against SOTA methods using cross-subject validation and demonstrate improved accuracy in cognitive activity classification, along with better generalizability across cohorts.
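Euclidean alignment is commonly implemented by whitening each subject's trials with the inverse square root of that subject's mean spatial covariance, so that every subject's average covariance becomes the identity. The sketch below assumes that formulation and trials shaped (n_trials, n_channels, n_samples); it is not taken from the EEG-NeXt code, and `euclidean_alignment` is a hypothetical name.

```python
import numpy as np


def euclidean_alignment(trials: np.ndarray) -> np.ndarray:
    """Align one subject's trials, shaped (n_trials, n_channels, n_samples).

    Assumed formulation: whiten by R^(-1/2), where R is the mean of the
    per-trial spatial covariances X X^T / n_samples.
    """
    covs = np.einsum("nct,ndt->ncd", trials, trials) / trials.shape[2]
    ref = covs.mean(axis=0)
    # Inverse matrix square root via eigendecomposition (ref is symmetric PD).
    eigvals, eigvecs = np.linalg.eigh(ref)
    ref_inv_sqrt = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    return np.einsum("cd,ndt->nct", ref_inv_sqrt, trials)


# Synthetic subject: 40 trials of correlated 8-channel noise.
rng = np.random.default_rng(0)
mixing = np.eye(8) + 0.5 * rng.standard_normal((8, 8))
trials = np.einsum("cd,ndt->nct", mixing, rng.standard_normal((40, 8, 256)))
aligned = euclidean_alignment(trials)
mean_cov = (np.einsum("nct,ndt->ncd", aligned, aligned) / aligned.shape[2]).mean(axis=0)
print("max deviation from identity:", np.abs(mean_cov - np.eye(8)).max())
```

Applying the function separately to each subject puts all subjects' trials in a common reference frame before they are pooled for training.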


Understanding Information Processing in Human Brain by Interpreting Machine Learning Models

arXiv.org Artificial Intelligence

This thesis explores the role that machine learning methods play in creating intuitive computational models of neural processing. Combined with interpretability techniques, machine learning could replace the human modeler and shift the focus of human effort to extracting knowledge from the ready-made models and articulating that knowledge into intuitive descriptions of reality. This perspective makes the case for a larger role for exploratory, data-driven approaches in computational neuroscience, coexisting alongside the traditional hypothesis-driven approach. We exemplify the proposed approach in the context of the knowledge representation taxonomy with three research projects that apply interpretability techniques on top of machine learning methods at three different levels of neural organization. The first study (Chapter 3) applies feature importance analysis to a random forest decoder trained on intracerebral recordings from 100 human subjects to identify spectrotemporal signatures that characterize local neural activity during a visual categorization task. The second study (Chapter 4) employs representational similarity analysis to compare the neural responses of areas along the ventral stream with the activations of the layers of a deep convolutional neural network. The third study (Chapter 5) proposes a method that allows test subjects to visually explore the state representation of their neural signal in real time. This is achieved with a topology-preserving dimensionality reduction technique that transforms the neural data from the multidimensional representation used by the computer into a two-dimensional representation that a human can grasp. The approach, the taxonomy, and the examples present a strong case for the applicability of machine learning methods to automatic knowledge discovery in neuroscience.
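Representational similarity analysis compares two systems through their representational dissimilarity matrices (RDMs) rather than their raw activations, which allows electrode recordings to be compared with CNN layers of a different size. A minimal sketch, assuming 1 minus the Pearson correlation as the dissimilarity and the Spearman correlation between RDM upper triangles as the comparison (common choices, not necessarily those used in the thesis); the helper names and data are hypothetical.

```python
import numpy as np


def rdm(responses: np.ndarray) -> np.ndarray:
    """RDM: 1 - Pearson r between stimulus rows of (n_stimuli, n_features)."""
    return 1.0 - np.corrcoef(responses)


def _ranks(v: np.ndarray) -> np.ndarray:
    """Ranks 0..n-1 without tie handling (assumes continuous values)."""
    r = np.empty(len(v))
    r[np.argsort(v)] = np.arange(len(v))
    return r


def rsa_score(responses_a: np.ndarray, responses_b: np.ndarray) -> float:
    """Spearman correlation between the upper triangles of two RDMs."""
    iu = np.triu_indices(responses_a.shape[0], k=1)
    a, b = rdm(responses_a)[iu], rdm(responses_b)[iu]
    return float(np.corrcoef(_ranks(a), _ranks(b))[0, 1])


# Hypothetical data: 20 stimuli seen by 50 electrodes and by a 30-unit CNN layer.
rng = np.random.default_rng(0)
neural = rng.standard_normal((20, 50))
layer = neural[:, :30] + 0.5 * rng.standard_normal((20, 30))
print("RSA score:", rsa_score(neural, layer))
```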


Investigating Deep Neural Transformations for Spectrogram-based Musical Source Separation

arXiv.org Machine Learning

Musical Source Separation (MSS) is a signal processing task that aims to separate a mixed musical signal into its individual acoustic sound sources, such as singing voice or drums. Many machine learning-based methods have recently been proposed for the MSS task, but no existing work has evaluated and directly compared various types of networks. In this paper, we design a variety of neural transformation methods, including time-invariant methods, time-frequency methods, and mixtures of two different transformations. By comparing several transformation methods, our experiments provide abundant material for future work. We train our models on raw complex-valued STFT outputs and achieve state-of-the-art SDR performance on the MUSDB singing voice separation task by a large margin of 1.0 dB.
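For reference, the basic signal-to-distortion ratio compares the energy of the reference source with the energy of the estimation error, in dB. MUSDB results are usually reported with the BSS Eval toolkit, which additionally allows a short distortion filter and aggregates scores over frames, so the simplified `sdr` helper below (a hypothetical name) will not reproduce published numbers exactly.

```python
import numpy as np


def sdr(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-12) -> float:
    """Plain SDR in dB: reference energy over error energy (no BSS Eval filter)."""
    signal_energy = np.sum(reference ** 2)
    error_energy = np.sum((reference - estimate) ** 2)
    return float(10.0 * np.log10((signal_energy + eps) / (error_energy + eps)))


rng = np.random.default_rng(0)
vocals = rng.standard_normal(44_100)  # stand-in for a 1 s reference stem
estimate = vocals + 0.1 * rng.standard_normal(44_100)
print(f"SDR: {sdr(vocals, estimate):.2f} dB")
```

Under this definition, a 1.0 dB SDR gain corresponds to about a 21% reduction in error energy for the same reference (10^(-0.1) ≈ 0.79).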


GANSynth: Adversarial Neural Audio Synthesis

arXiv.org Machine Learning

Efficient audio synthesis is an inherently difficult machine learning task, as human perception is sensitive to both global structure and fine-scale waveform coherence. Autoregressive models, such as WaveNet, model local structure at the expense of global latent structure and slow iterative sampling, while Generative Adversarial Networks (GANs) have global latent conditioning and efficient parallel sampling but struggle to generate locally coherent audio waveforms. Herein, we demonstrate that GANs can in fact generate high-fidelity and locally coherent audio by modeling log magnitudes and instantaneous frequencies with sufficient frequency resolution in the spectral domain. Through extensive empirical investigations on the NSynth dataset, we demonstrate that GANs are able to outperform strong WaveNet baselines on automated and human evaluation metrics, and efficiently generate audio several orders of magnitude faster than their autoregressive counterparts.
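The instantaneous-frequency representation can be sketched as the frame-to-frame difference of the STFT phase after unwrapping along time, scaled by 1/pi so that it lies in [-1, 1]. The NumPy version below is a simplified illustration rather than GANSynth's exact preprocessing: the FFT size, hop, and helper names are assumptions, and the mel-scaled variants explored in the paper are omitted.

```python
import numpy as np


def stft(x: np.ndarray, n_fft: int = 1024, hop: int = 256) -> np.ndarray:
    """Hann-tapered STFT without padding; shape (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def log_mag_and_if(x: np.ndarray, n_fft: int = 1024, hop: int = 256, eps: float = 1e-6):
    """Hypothetical helper: log magnitude and instantaneous frequency (phase diff / pi)."""
    spec = stft(x, n_fft, hop)
    log_mag = np.log(np.abs(spec) + eps)
    # Unwrap along time so frame-to-frame jumps stay within [-pi, pi].
    phase = np.unwrap(np.angle(spec), axis=0)
    # The first frame keeps its raw phase; later frames hold phase differences.
    inst_freq = np.diff(phase, axis=0, prepend=np.zeros_like(phase[:1])) / np.pi
    return log_mag, inst_freq


fs = 16_000
note = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)  # 1 s, 440 Hz test tone
log_mag, inst_freq = log_mag_and_if(note)
print(log_mag.shape, float(np.abs(inst_freq).max()))
```

Unlike raw phase, which wraps unpredictably from frame to frame, this difference is nearly constant over time for a stable harmonic, which is what makes it easier for a generator to model.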


Training a Neural Speech Waveform Model using Spectral Losses of Short-Time Fourier Transform and Continuous Wavelet Transform

arXiv.org Machine Learning

Recently, we proposed short-time Fourier transform (STFT)-based loss functions for training a neural speech waveform model. In this paper, we generalize this framework and propose a training scheme for such models based on spectral amplitude and phase losses obtained by the STFT, the continuous wavelet transform (CWT), or both. Since the CWT can have time and frequency resolutions different from those of the STFT and can take into account resolutions closer to human auditory scales, the proposed loss functions could provide complementary information on speech signals. Experimental results showed that a high-quality model can be trained with the proposed CWT spectral loss and that it performs as well as one trained with the STFT-based loss.
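An STFT spectral amplitude loss can be sketched as the mean squared error between the log-amplitude spectrograms of the target and generated waveforms. The NumPy version below only evaluates the loss (training would need an autodiff framework), covers the amplitude term only (no phase or CWT losses), and uses assumed FFT sizes and hypothetical helper names; it is not the authors' exact formulation.

```python
import numpy as np


def stft_magnitude(x: np.ndarray, n_fft: int, hop: int) -> np.ndarray:
    """Hann-tapered STFT magnitude without padding; shape (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))


def log_amplitude_loss(target: np.ndarray, generated: np.ndarray,
                       n_fft: int = 512, hop: int = 128, eps: float = 1e-7) -> float:
    """MSE between log STFT amplitudes (amplitude term only, assumed form)."""
    a = np.log(stft_magnitude(target, n_fft, hop) + eps)
    b = np.log(stft_magnitude(generated, n_fft, hop) + eps)
    return float(np.mean((a - b) ** 2))


rng = np.random.default_rng(0)
speech = rng.standard_normal(16_000)  # stand-in for a 1 s waveform
print(log_amplitude_loss(speech, speech), log_amplitude_loss(speech, 0.5 * speech))
```

Evaluating the same loss for several (n_fft, hop) pairs and summing the results is one possible way to cover multiple time-frequency resolutions with the STFT alone.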